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Abstract 

We compare classical and Bayesian methods for fitting the poisson distribution to the number of 
hurricanes making landfall on sections of the US coastline. 

1 Introduction 

The occurrence of a number of hurricanes making landfall on the US coastline in 2004 and 2005 has 
increased the level of interest in the question of how to estimate the risk of hurricane damage in different 
locations. There are many aspects to this problem, such as questions about the number of hurricanes 
occurring in different regions, their size, their intensity, their speed, the detailed structure of their wind- 
field, how they decay when they reach land, how much damage the winds will cause to structures, and 
so on. In this paper, we address one small part of one of these questions: how to estimate the distri- 
bution of the number of hurricanes that make landfall on any short stretch of coastline using historical 
hurricane data. This question itself has various aspects to it, such as to what extent we can extrapolate 
historical hurricane data in space and to what extent the changing climate is changing the distribution 
of the number of hurricanes. We don't attempt to deal with all these issues (at least not here and now). 
Rather, we focus on one very small part of the problem, which is to look at the most simple statistical 
methods that one might use to estimate hurricane landfall rates, ignoring the complex issue of climate 
change completely. Given the historical data on hurricane landfalls, we consider three basic methods 
one can consider using to estimate the distribution of hurricanes hitting a certain region: the first is the 
classical method of moments, the second is the classical maximum likelihood fitting procedure, and the 
third is a Bayesian prediction procedure. The point of this paper is to discuss these three procedures, 
and compare their pros and cons when applied to this particular problem. 

In section |21 we describe the question we want to address in more detail. In section [3] we describe the 
classical methods. In section 0] we describe the Bayesian method. In section [5] we perform a numerical 
comparison of the two methods. In section [B] we discuss one theoretical way in which these methods can 
be compared. In sectional we apply the two methods to observed hurricane landfall data, and compare 
the results using cross-validation. Finally in section |S| we discuss what we find. 



2 Setup 

We now give a mathematical summary of the problem we are considering. 

We consider a short section of the US coastline, and we wish to estimate the distribution of the number 
of hurricanes crossing this section of coastline. We have m years of historical hurricane data that we can 
use to estimate this distribution. In practice m has values of between 50 and 150 years, depending on 
the extent to which one is willing to use earlier and hence less accurate data. In our examples below we 
will use the 54 years of data from 1950 to 2003. 

During these m years of history there have been i hurricanes making landfall on our section of coastline. 
To illustrate possible values of i, figure ^ shows the numbers of hurricanes making landfall in each of 
39 arbitrary segments along the US coastline during our 54 year period. If we were to consider smaller 
segments, or more intense hurricanes only, then the values of i would be smaller. 
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We will assume that the distribution of the number of hurricanes making landfall is constant in time. 
This is probably not true: recent years appear to have seen an increase in the numbers of hurricanes, 
for instance. Our goal, however, is not to deal with that issue (we, and others, have discussed ways to 
address that elsewhere), but to look at the basic statistics of this problem. We will also assume that the 
number of hurricanes within a year is given by a poisson process. This is also probably not true, but 
again making this simplification allows us to look at and think about some important statistical issues. 
To summarise: given m years of historical hurricane data in which there have been i hurricanes, and 
assuming stationarity and the poisson distribution, how should we estimate the probability fin) of n 
hurricanes hitting this segment of coastline in a single year? Since we are assuming that / is a poisson 
distribution, f(n) is given by: 

/(») = — m 

where A is the real unknown expected number of hurricanes per year. 
For convenience below we will use the shorthand that: 

x e~ x X n 

g(X,n) = j- 2 

3 Classical methods for fitting the poisson distribution 

There are two classical approaches to this problem: method of moments, and maximum likelihood. We 
discuss them in turn. 

3.1 Method of moments 

Method of moments works as follows. We equate the observed annual mean number of hurricanes (which 
is ^) to the modelled rate A: 

A = - (3) 

m 

and we substitute this estimate A into equation ^ in place of the real A, giving: 

/(«) - (4) 

That's it: f(n) is then our estimate of /(n). 

3.2 Maximum likelihood 

Maximum likelihood works as follows. The likelihood is defined as the probability of the observed data 
given the model and the parameters of the model. Considering likelihood as a function of the parameter 
A, we vary A and find the maximum of the likelihood. The value of A that maximises the likelihood is 
then plugged into equation ^ 

Applying this in practice in our example, we note that we can consider a poisson with rate A for m years 
as giving the same distribution of numbers of events as a poisson with rate Am, once. 
The likelihood is thus given by: 

L(X) = B ~ Xm ^ m)i (5) 



Taking logs: 



logL (6) 
" e - Am (Am; 
. <! 

= — Am + ilog(Xm) — log(i\) (8) 



log 



(7) 



Differentiating this by A: 



Setting this equal to zero gives: 

A=i- (10) 
m 

and this is then substituted into equation ^ in place of the real A, giving 

e~*A™ 

/(») = — 

3.3 Discussion of the classical methods 

We see that method of moments and maximum likelihood give the same results. Do these two methods 
make sense? Both are based on the idea that, in order to model the distribution of hurricanes, we should 
choose our single best estimate of the rate A. Prima facie, this seems reasonable, but it can certainly be 
criticised. The main criticism is: why use only our best estimate? What about all the other estimates 
of the rate that one might make, which are not quite as likely to be correct? Given the small amount 
of data for hurricanes for short sections of coastline, it seems likely that there is quite a large range of 
reasonable estimates for A. Instead of using just the most likely of this range of estimates, perhaps we 
should use them all in some way. 

Another criticism is: consider the case where i = 0. In other words, we have a section of coastline where 
there haven't been any hurricanes in the last m years. What do the classical methods give us? Our 
estimate of the rate is A = — = — = 0, and hence our estimate of the distribution is f = - — 9— = 0. 
In other words, because we haven't seen any hurricanes in the last m years we conclude that it must be 
completely impossible that a hurricane might strike us in the future. This is clearly an illogical conclusion: 
there are many stretches of coastline that haven't experienced a hurricane strike in recent times, but for 
which, on a meteorological basis, it is quite clearly possible that they could experience such a strike. 
In fact, this second criticism is related to the first. If there haven't been any hurricanes in the last m 
years, it may well be the case that an appropriate conclusion should be that the most likely value of A 
is 0, but it is very clear that we also need to consider the (perhaps small) possibility that A is actually 
greater than zero. 

One final criticism of the classical methods is: note that the values of i and m only occur as the ratio — 
in these methods. The absolute value of m doesn't make any difference at all, and so there can be no 
accounting for the extra accuracy that extra years of data might bring (or vice versa) . 
These criticisms lead us onto Bayesian methods, in which we move beyond just considering the single 
most likely value of A, and consider a range of possible values. 



4 Bayesian methods for fitting the poisson distribution 

The Bayesian method we describe works as follows. Based on the discussion in the previous section, we 
will express our prediction for the distribution of the number of hurricanes as an integral over all possible 
hurricane rates that might have given us the observed data, rather than just the most likely rate. The 
prediction is then: 

f(n\i) = j f(n\X)f(X\i)dX. (12) 

The first term in the integral (/(n|A)) is the probability of n hurricanes, given a certain value of A. The 
second term gives the probability of each value of A, given the observation of i hurricanes in the last m 
years. We can think of this integral as a weighted average of predictions /(n|A), where f(X\i) gives the 
weights. /(A|z) is often known as the posterior density of A. 

How, then, can we calculate /(A|i)? Applying Bayes' theorem, we can factorise f(X\i) into: 



f(X\i) <x /(i|A)/(A) (13) 

The first term on the right hand side can be evaluated easily: it's just the probability density of the 
poisson distribution. 

The second term, known as the prior, needs a little more thought. If we have prior information on the 
distribution of possible values of A, we can use this prior distribution to include that information in the 
analysis. In this article, however, we will assume that we have no such prior information. Instead, we will 
try and choose /(A) so as to be neutral with respect to different possible values of A (such a choice is often 
known as a 'reference prior', or 'uninformative prior'). Unfortunately, there doesn't seem to be a single 



unambiguous choice for what this neutral prior should be. We have found the following 3 reasonable 
suggestions: 

• p(A) = c, a constant value, justified on the basis that this puts equal weights on all possible values 
of A 

• p(X) = cA~ 5 ; known as the Jeffrey's prior, and justified on the basis that it is invariant to changes 
in the scale of A 

• pW = § 

We can combine these 3 possibilities into one general form: 

/(A) = c\ a (14) 

where a has the values 0,— f/2or— fin our three cases. 
This then gives 

/(A|t) oc /(t|A)/(A) (15) 
oc cf(i\X)X a (16) 

We can calculate the constant of proportionality c using the fact that the integral of /(A|i) must be f for 
it to be a probability distribution: 

r f{X\i)dX = c J f(i\X)dX (17) 
Xm ^ Xa dX (18) 

-Am \i-\-a 



m l i\ 
c 



X l+a dX (19) 



ds 



(20) 



m l il J m 

+ (21) 

1 (22) 



implying that 



and hence the prior is: 



and the posterior distribution for A is: 



(23) 



/(A) = ^TT^T ( 24 ) 



(i + a)\ 

n 1+a i\ 
(i + a)\ 



m 



f(X\i) = —-— e ~ Xm { m Xy +a (25) 
(t + a)! 

= mg(Xm,i + a) (26) 

Note that this last line should not be read to imply that the distribution of /(A|i) is a poisson distribution, 
since now the roles of the parameter and the random variable are switched. In fact this is a gamma 
distribution. 

Now consider the case a = —I, i = Q. The posterior becomes: 

/(AW = 7CT)j e ~ Am (™ A ) _1 ( 2? ) 

This is not a proper posterior density, since the integral is not finite, so we reject the a = — 1 prior, 
leaving just the cases a = and a = —1/2. 

Figure [3 shows the posterior densities for m = 54 and i — 0,2,4,6, for both priors. We see that the 
differences are rather small. 



Where are the maximum values of these posterior distributions? Differentiating the prior wrt A gives: 



dp m «-i-« e - Am A !+a - 1 . , . 

3T = T- Ti i + a-Am (28 

dX (i + a)\ 

Setting this equal to zero then gives: 

A = I±2 (29) 
m 

For the a = prior, this gives A = i/m, which agrees exactly with the maximum likelihood estimate of 
A. This is, of course, because the posterior is equal to (or proportional to) the likelihood for a constant 
prior. 

For the a = —1/2 prior, this gives the slightly lower value of A = (i — l/2)/m. The use of a prior that 
weights towards A = has shifted the maximum slightly relative to the maximum in the likelihood. The 
implication is that this value of A is now the most likely (which we find slightly surprising) . 
What about the mean value of A under these posterior distributions? 



"' e- Xm (mXy +a XdX (30) 



(i + a)\ 

l+a+i 

■ x i +a +l e -Xm dx ( 31 ) 



(i + a) 



l+a+i r , s . i+ a +l ds 

e~° 



(i + a) 
1 (i + a + 
m (i + a)\ 



/_£_\ ^jf_ (32) 

\mJ m 

(33) 



For the a = prior this gives mean lambda = (i + l)/m, while for the a = —1/2 prior this gives the 
rather unpleasant looking mean = fepfyw;^- Note that both of these values are larger than the mean 
that comes from the classical analysis, which is i/m. This is because the uncertainty wrt the possibility 
of values of A greater than i/m adds more to the calculation of the mean than the uncertainty wrt the 
possibility of values of A lower than i/m takes away. This is especially noticeable for the case i = 0, 
where the two means are both above zero in the Bayesian case. 

Substituting the prior into our expression for the Bayesian forecast, equation^] gives: 



f(n\i) = I f(n\X)f(X\i)d\ (34) 
g(X,n)mg(Xm,i + a)dX (35) 



n\(i + a)\ 



I \ i+a+n+l 



e - {m+1)x X n+l+a dX (36) 
(i + a + n)l (37) 

(38) 



n\(i + a)\ \m + 1 
(I + a + n) ! / m \ I 1 



(i + a)\n\ \m + 1 J \m + 1 

This is now our estimate for the distribution for the number of hurricanes, based on Bayesian reasoning, 
and is, in fact, the negative binomial distribution. 



4.1 Discussion of the Bayesian method 

There are a number of interesting implications of equation 1381 

The first is simply that, to predict, or 'model' the poisson distribution, one shouldn't actually use the 
poisson distribution: one should use the negative binomial instead. 1 . This is because of parameter 

1 Although it is important to realise that the negative binomial in this case is fitted using i and m, and not using the 
standard methods for fitting the negative binomial such as method of moments, which would give different parameter values 
and would only be valid if we thought that the negative binomial, not the poisson, was a good model for the underlying 
data. 



uncertainty. The fact that we don't know the exact value of the parameter A, and that we represent 
that uncertainty using a distribution, converts the poisson into the negative binomial (even if we are still 
considering the events to be independent). 

The second interesting implication, already mentioned above, is that the mean of our forecast is now 
higher than for the classical case. The mean of the classical forecast (which is — ) arises simply because 
the mean is given by the value of the rate, and the value of the rate is chosen to be the maximum 
likelihood value of The Bayesian mean, on the other hand, arises through consideration of exactly 
what can be inferred from the observation of i hurricanes in m years. We feel that the Bayesian forecast 
mean is thus much more strongly justified than the classical mean. 

One particular case of interest is when i = 0: in this case the mean predicted number of hurricanes is 
created entirely from the possibility of the rate being higher than the most likely estimate of the rate. 
The fourth interesting implication is the variance of the forecast. 
The variance can be calculated as: 



variance = / , - ^ e- Xm (m\Y +a \ 2 d\ (39) 
(i + ay. 

x i +a +2 e -Xm dx ( 4 q) 



(i + a)\ 

m 1+a+i f _ s i s y+"+2 ds 

(i + a)\ J 6 \m) m ^ ' 

1 (i + a + 2)\ 



{i + a)\ 



(42) 



This is larger than the variance of the classically fitted poisson, which is and the ratio of the variance 
to the mean is not 1, as it is for the poisson, but is greater than 1. 

The fifth interesting implication is that we now have a non-trivial distribution for the possible number 
of future hurricanes, even in the case when i — 0, given by: 

, , x (i + a + n)\ ( m \ t+a+1 ( 1 \ n 
P(n\i = 0) = \, | (zrr-r) Iz—j) (43) 

(44) 



(i + a)\n\ \m + 1 
(a + n)\ f m 



{a)\n\ \m + 1 J \m + l 

So, interestingly, even if there have never been any hurricanes in the past, this method still predicts a 
non-zero probability for hurricanes in the future. This seems reasonable: since m is finite, we can't rule 
out the possibility that we haven't just been very lucky over the last m years and avoided any hurricane 
strikes. 



5 Numerical comparison 

In this section, we perform a numerical comparison of forecasts from the classical and Bayesian methods, 
for different values of i. We ask the question: how much difference does it really make to the final 
probabilities we predict if we use the Bayesian method? We have already seen that for i = there is a 
big difference between the two methods (the classical method predicts zero probabilities for n > 0, while 
the Bayesian methods predict non-zero probabilities) but what about for larger values of il. Figure EH 
shows predictive distributions from the classical and Bayesian methods, for m — 54, and values of i of 
1,5, 10 and 20. We see large differences between the classical and Bayesian methods for all the values 
of i tested. The Bayesian methods give a flatter, broader distribution, as expected, and the probability 
of extreme numbers of hurricanes is much higher. The differences between the two Bayesian methods, 
however, are much smaller, with the a = method giving a slightly flatter, broader distribution. 
At this point, we weigh up the pros and cons of the two priors that we are still considering, and make 
a decision as to which to use. In favour of the a = —1/2 prior, there is a piece of (slightly esoteric) 
theoretical reasoning related to the Fisher information and transformations of scale. In favour of the 
a = 0, we have that: 

• a flat prior in A reflects our desire to avoid having the prior influence the final result, while the 
a = —1/2 prior weights towards lower values of A 



• the flat prior gives an intuitively reasonable value of the most likely value of A, while the a = —1/2 
gives a most likely value that conflicts with intuition 



• the mathematics is simpler 

• the differences between forecasts made by the two priors is very small (relative to the difference 
between classical and Bayesian priors) 

• it is slightly more conservative (i.e. gives wider distributions) 

Based on this we conclude that we prefer the a = prior. 

For convenience, we now summarise the key properties of this prior. 

5.1 Summary of properties for the uniform prior 

prior = to (45) 
posterior = ^ e - Xm (mXY (46) 



(i + nV ( to V +1 ( 1 \ ™ 

predictive probability = — — (47) 

(i+)\n\ \m+lj \m + 1 J 

i + 1 

forecast mean = (48) 

m 

(* + !)(* + 2 ) , A n\ 

forecast variance = ^ (49) 

i 

maximum of the posterior = — (50) 

TO 

6 Scoring probabilistic forecasts 

We now change tack slightly, and consider how one might evaluate a prediction of the distribution of 
the number of hurricanes. The scoring system we will use is based on the out-of-sample log-likelihood , 
which we have previously discussed and used in a number studies, such as Ijewson and Penzerl (12006^1 . 
We believe this is the most obvious score to use for comparing probabilistic forecasts, and it is closely 
related to a large body theory concerning the cross-entropy and the Kullback-Leibler divergence. 
The out-of-sample likelihood is defined loosely as the expectation of the log of the probability of the 
observations given the forecast. This is only a loose definition because to make it precise we need to 
sp ecify what we mean by exp ectation, and there are several possibilities. This is discussed in more detail 
in l.Iewson and Penzerl ||200(tI) . In this section we will define the expectation as being over all future values, 
holding the historical data (the value of i) fixed, a nd over all possible values for the unknown parameters 
(this is the score S3 in l.Iewson and Penzerl l|20t)6l) \ 
For the classical prediction methods, the score is: 



J mg(Xm,i) ^^g(A,n) logg^,n^J 
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For the Bayesian prediction method the score is: 
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(59) 
(60) 
(61) 
(62) 
(63) 



We now plot these scores (the classical forecast score and the Bayesian forecast score) for m = 54 and 
for a few values of i, in figure^] We see that the Bayesian score always beats the classical score, but the 
difference between the scores reduces as i increases. 



7 Empirical comparison 

Finally, we compare the classical and Bayes prediction methods for real data. Our comparison is based 
on a carefully constructed cross-validation procedure (the Quenouille-Tukey jackknife) to make it as fair 
and realistic as possible. It works as follows: 

• we loop over the 54 years of data, missing out each year in turn 

• we fit the two models to the remaining 53 years of data 

• we calculate the log of the probability of the 54th year of data for both models 

• we average together the logs of the probabilities over the 54 years 

• we repeat this for each of our 39 gates along the US coastline 

The scoring system being used can be considered as an empirical version of the predictive log-likelihood 
score that we use in sectional where the expectation is now over the loop in the jackknife (which doesn't, 
however, correspond exactly to the definition of expectation used in section [SJ). 

Results from this comparison are shown in figure We see that the Bayesian method beats the classical 
method for each one of the 39 gates. The difference are largest for the segments where there are fewest 
hurricanes, as we'd expect. 

Finally, figure shows the actual probabilities predicted for the occurrence of 1 hurricane by gate for the 
classical and Bayesian methods. We see that the probabilities are mostly reasonable similar, but that big 
differences occur where there are very few historical hurricanes. 



8 Discussion 

We have discussed the question of how to predict the distribution of the number of hurricanes crossing 
segments of the US coastline. We have taken a very simple approach based on the assumptions that events 
are independent, and that the underlying poisson rates are constant in time. This simple framework 
allows us to compare classical and Bayesian statistical methods in some detail. We derive expressions 
for the classical and Bayesian forecasts, and further expressions for their performance under an expected 
predictive log-likelihood scoring system. At a theoretical level we find that the Bayesian method performs 
better. We then test our classical and Bayesian prediction methods on real hurricane data, using a jack- 
knife cross-validation scheme. We find that the Bayesian method does better in practice too, but that 



again the differences in the forecasts are only significant for those sections of coastline with very low 
historical hurricane rates. 

The next stage of our research is to compare these forecasts of landfall rat es with forecasts derived from 
the basin-wide track model we have described in a series of recent papers (|Hall and JewsonLl2005[) . 

References 

T Hall and S Jewson. Statistical modelling of tropical cyclone tracks part 6: non-normal innovations. 
arXiv:physics/0512135, 2005. 

S Jewson and J Penzer. Weather derivative pricing and the normal distribution: fitting the variance to 
maximise expected predictive log-likelihood. http://ssrn.com/abstract=911569, 2006. 




Figure 1: The observed numbers of hurricanes crossing 39 straight-line segments approximating the 
North American coastline, for the period 1950-2003. 





Figure 2: The posterior density for the poisson rate, given 54 years of data and 0,2,4 and 6 observed 
hurricanes making landfall over that 54 year period. 
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Figure 3: Predictions of future hurricane numbers, based on either 1, 5, 10 or 20 historical hurricanes 
making landfall in a 54 year period. The solid line shows classical predictions and the dotted line shows 
Bayesian predictions. 
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Figure 5: Performance of the classical and Bayesian prediction methods, evaluated using cross-validation, 
for the number of hurricanes crossing our 39 coastline segments. The score is the out-of-sample expected 
predictive log-likelihood. The top right panel shows the score for the classical prediction, the middle 
right panel shows the score for the Bayesian prediction, and the bottom right panel shows the difference 
between the two. We see that the Bayesian method wins for every segment. 
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Figure 6: Probabilities of one hurricane making landfall in a year, from classical and Bayesian predictions. 
The top panel shows the probabilities themselves (classical=cirles, Bayesian=squares). The middle panel 
shows the differences, and the bottom panel shows the differences divided by the classical probabilities. 
In the bottom panel values of 2 are actually infinity. We see that the Bayesian probabilities are all higher, 
although only by a small absolute amount. The relative differences are also small for most gates, except 
where there are either no historical hurricanes, or a very small number of historical hurricanes. 
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